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Abstract 

This  paper  presents  an  SQP-based  interior  point  technique  for  solv¬ 
ing  the  general  nonlinear  programming  problem  using  trust  region 
globalization  and  the  Coleman-Li  scaling.  The  SQP  subproblem  is 
decomposed  into  a  normal  and  a  reduced  tangential  subproblem  in 
the  tradition  of  numerous  works  on  equality  constrained  optimization, 
and  strict  feasibility  is  maintained  with  respect  to  the  bounds.  This 
is  intended  to  be  an  extension  of  previous  work  by  Coleman  &  Li  and 
Vicente.  Though  no  theoretical  proofs  of  convergence  are  provided, 
some  computational  results  are  presented  which  indicate  that  this  al¬ 
gorithm  holds  promise.  The  computational  experiments  have  been 
geared  towards  improving  the  semi-local  convergence  of  the  algorithm; 
in  particular  high  sensitivity  of  the  speed  of  convergence  with  respect 
to  the  fraction  of  the  trust  region  radius  allowed  for  the  normal  step 
and  with  respect  to  the  initial  trust  region  radius  are  observed.  The 
chief  advantages  of  this  algorithm  over  primal-dual  interior  point  al¬ 
gorithms  are  better  handling  of  the  ‘sticking  problem’  and  a  reduction 
in  the  number  of  variables  by  elimination. of  the  multipliers  of  bound 
constraints. 


*This  research  was  partially  supported  by  the  Dept,  of  Energy,  DOE  Grant  DE-FG03- 
95ER25257  and  by  the  National  Aeronautics  and  Space  Administration  under  NASA 
Contract  No.  NASl-19480  while  the  author  was  in  residence  at  the  Institute  for  Com¬ 
puter  Applications  in  Science  and  Engineering  (ICASE),  NASA  Langley  Research  Center, 
Hampton,  VA  23681-0001. 
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1  Introduction 


This  paper  focuses  on  an  algorithm  for  solving  the  general  nonhnear  pro¬ 
gramming  problem  in  the  form 


min  f(x) 
s.t.  h{x)  =  0^ 

a  <  X  <  b  (NLP) 

where  f  :  f-*  11,  h  :  TZ^  TZ^  are  twice  continuously  differentiable 

mappings  and  m  <  n.  Recently  at  least  two  excellent  pieces  of  work  on  using 
trust  region  globalization  in  interior  point  techniques  for  solving  nonlinear 
optimization  problems  have  appeared.  Coleman  and  Li  in  [1]  and  [2]  discuss 
interior  point  algorithms  for  solving  the  bound-constrained  problem 

min  f{x) 

X 

s.t.  a  <  X  <  b 

Vicente  in  his  Ph.D.  thesis  [3]  extensively  deals  with  an  extended  form  of 
the  above  problem  with  equality  constraints  having  a  special  structure,  and 
bounds  only  on  the  control  variables  u; 

min  f{y,  u) 

s.t.  C{y,u)  =  0 
a  <  u  <  b 

where  dim{C)  =  dim{y). 

Both  [1]  and  [3]  rigorously  prove  global  convergence,  second-order  con¬ 
vergence  under  reasonable  assumptions^  and  local  q-quadratic  convergence 
in  their  respective  settings.  This  article  draws  on  the  results  and  accrues 
on  the  experiences  of  the  two  aforementioned  works  and  extends  their  al¬ 
gorithms  to  handle  problem  (NLP).  Though  no  proofs  are  provided,  the 

^This  takes  into  account  inequality  constraints  5(1)  <  0  as  well  since  inequalities  can 
be  converted  to  equalities  by  adding  slack  variables  and  imposing  nonnegativity  bounds 
on  the  slacks. 

^Second  order  convergence  denotes  convergence  to  a  point  satisfying  second  order  nec¬ 
essary  conditions  for  a  local  ininimum. 
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algorithm  described  here  has  been  implemented  and  computationally  tested 
and  found  to  be  reasonably  successful;  more  testing  is  in  progress.  An  im¬ 
portant  observation  made  in  this  paper  is  that  the  trust  region  subproblems 
in  the  scaled  and  unsealed  steps  are  not  equivalent,  contrary  to  Coleman  & 
Li’s  claim  in  [1],  and  this  fact  is  proved  in  the  appendix. 

2  Preliminaries 

The  proposed  algorithm  here,  like  its  predecessors  in  Coleman  and  Li  [1] 
and  Vicente  [3],  starts  at  a  point  strictly  feasible  with  respect  to  the  bounds 
on  the  variables  and  produces  iterates  that  are  strictly  feasible  with  respect 
to  the  bounds  (i.e.  ‘in  the  interior’).  The  steps  in  the  algorithm,  as  in  the 
above  two,  can  be  motivated  by  applying  Newton’s  method  to  a  very  spe¬ 
cial  statement  of  neccesary  condition  for  optimality  for  problem  (NLP),  as 
stated  in  the  following  exposition: 

Let  l{x,  A)  =  f{x)  +  X'^h(x)  (which  is  not  the  Lagrangian  for  problem 
(NLP),  since  the  Lagrangian  would  also  involve  the  multipliers  correspond¬ 
ing  to  the  bound  constraints).  Define  the  diagonal  scaling  matrix  D{x)  as 

D(x)  =  diag{d{x)). 


where 


di{x) 


Vbi  -  Xi,  if  {Vxl{x,  A)),  <  0  and  bi  <  oo 
<  y/xi  —  G,-,  if  (Va;/(x,  A)),- >  0  and  a,- >  — OO, 
1,  otherwise 


Then  {x*,  A*)  satisfy  first  order  necessary  conditions  for  optimality  of  prob¬ 
lem  (NLP)  if  and  only  if 


D2(a;*)V^/(a:*,A*)  =  0 

h{x*)  =  0  (1) 

The  proof  for  the  above  becomes  obvious  on  a  closer  scrutiny  of  the  Karush- 
Kuhn- Tucker  conditions  for  problem  (NLP),  which  requires  the  existence  of 
(a^*,A*,//*,/u^)  satisfying 


VJ{x\X*)-^i:  +  nt  =  0 
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h{x*)  =  0, 


where  and  ij.1  are  vectors  of  non-negative  multipliers  of  the  finite  bound 
constraints,  satisfying  the  complementarity  conditions: 

<(a,-xn  =  0 

=  0  (2) 

for  aU  i  corresponding  to  Xi  with  finite  bounds. 

Keeping  in  mind  that  a  positive  or  negative  value  of  (Vj;/(x*,  A*)),-  can 
only  be  canceled  in  the  above  by  respectively  a  positive  Hai  or  a  positive  fif,i , 
complementarity  makes  the  equivalence  between  (1)  and  (2)  obvious. 

Quite  clearly,  other  possible  choices  of  D{x)  exist  which  permit  the  above 
equivalence,  however  the  above  choice  with  square  roots  of  the  distances  of 
the  variables  from  the  bounds  permits  local  q-quadratic  convergence  for  the 
treatments  in  Coleman  and  Li[l]  and  Vicente  [3]  in  spite  of  the  nonsmooth¬ 
ness  of  D‘^{x)  and  hence  is  our  choice  here. 


2.1  Newton  step 

Let  i]{x)  be  a  vector  such  that 


Vi{x) 


dXi 


,i  —  1 , . . . 


n 


The  above  definition  of  7/(x)  is  equivalent  to' 


77i(x) 


-1,  if  (Va,/(x,A))i  <  0  and  6,  <  oo 
^  1,  if  (Va;/(x,  A))i  >  0  and  Cj  > -oo, 
0,  otherwise 


(3) 


Then  it  can  be  shown  that  a  Newton  step  in  (x.  A)  on  nonlinear  system 
(1)  is  given  by 

[I>^(x)V^/(x,  A)  -I-  diag{Vxl{x,  A))diap(7?(x))] Ax  -|-  Z?^(x)Va;h(x) AA 

=  -D‘^{x)VJ{x,X)  (4) 

Vj;/i(x)^Ax  =  — h(x) 
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Multiplying  throughout  on  the  left  of  the  first  equation  by  D  the  above 

system  can  be  equivalently  written  as 


[Vll{x,  X)+diag{Vxl{x,  X))diag{T]{x))D  ^(x)]Ax+Vi:fi(x)AA  =  —Vxl{x,X) 

T  (S) 

Vxh{xf  Ax  =  -h{x) 

Introducing  the  scaled  step  Ax  =  D~^{x)Ax,  the  above  system  takes 
the  form 

[V^/(x,  A)£>{x)  +  diag{Vxl{x,  A))dia5f(?;(x))Z)“^(x)]  Ax  +  Vj;h(x)AA 

= -i?2(x)V^/(x,A) 
[I>(x)Vj:h(x)]^Ax  =  — /l(x) 

Multiplying  throughout  on  the  left  of  the  first  equation  by  D{x)  and 
remembering  that  diagonal  matrix  multiplication  commutes,  we  arrive  at 
the  system 

[(Z?(a:)  \)D{x)  +  diagiV ^  A))diajf(77(a:))]  Ax  +  D{x)V xh{x)l\X 

=  -D{x)Vxl{x,\) 

[D{x)V  xh{x)]^  Ax  =  -h{x)  (6) 

3  Trust  region  SQP  formulation 

The  trust  region  subproblems  based  on  the  above  Newton  steps  can  be 
formulated  in  terms  of  the  scaled  step  s  or  in  terms  of  the  unsealed  step; 
we  shall  do  both  and  compare  their  performances.  Even  though  Coleman 
&  Li  in  [1],  pg.  422,  state  that  the  two  formulations  are  equivalent,  we 
prove  that  the  steps  generated  by  the  two  formulations  can  be  completely 
different  when  the  Newton  step  does  not  lie  within  the  trust  region  (see 
Appendix  A).  In  fact,  this  may  be  the  only  explanation  for  their  difference 
in  computational  performances.  Even  though  the  scaling  is  only  a  linear 
transformation,  it  is  introduced  in  a  nonlinear  setting,  and  hence  creates  a 
difference.^ 

^This  is  similar  to  the  phenomenon  that  the  steps  generated  by  the  log  barrier  method 
in  primal-dual  interior  point  techniques  and  those  generated  by  the  damped,  perturbed 
KKT  formulation  never  agree,  as  shown  by  Tapia  (see  Tech  Report  TR92-40,  Dept, 
of  Computational  &  Applied  Math,  Rice  University  by  El-Bakry,  Tapia,  Zhang  and 
Tsuchiya). 
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3.1  Scaled  step  subproblems 

System  (6)  suggests  the  following  trust-region  SQP  ^  subproblem  in  terms 
of  the  scaled  step  s  (||.||  denotes  the  h  norm  all  throughout): 

min  ^-s^Hs  -I-  [D{x)'Vxl{x,  A)]^J 

s  2 

s.t.  [D{x)Vxh{x)]^s  +  h{x)  =  0 

Pll  <  ^ 

where 

H  =  D{x)Vll{x,  X)D{x)  -1-  diag{Vxl{x,  X))diag{Tj{x)) 

It  is  well-known  that  if  the  trust  region  constraint  on  the  length  of  the  step 
s  is  not  active,  the  solution  to  the  above  in  the  primal  and  dual  variables 
is  exactly  the  Newton  step  in  (x,A)  given  by  system  (6).  However,  this 
subproblem  may  weU  be  infeasible,  and  this  is  taken  care  of  by  a  bilevel  step 
decomposition  approach,  the  description  of  which  follows. 

3.1.1  Step  decomposition 

In  the  tradition  of  numerous  works  on  equality  constrained  optimization 
works  in  recent  years  (see,  for  example,  [8],  [10],  [4],  [3],  to  name  only  a 
few),  we  adopt  a  bilevel  approach  and  decompose  the  trust  region  SQP 
subproblem  above  into  the  two  subproblems  below: 

min  ||[X>(a:)Va;h(a;)]^Sn  + 

Sn 

s.t.  ||s„||  <  vS, 

where  i/  e  (0, 1),  usually  chosen  to  be  G  [0.5;  0.8],  followed  by 
min  Hs  -f  [D(x)'Vxl{x,  A)]^J 

St  2 

s.t.  [D{x)'Vxh{x)fs  =  [D{x)'V  xh{x)f  Sn 

Pll<^, 

^Sequential  Quadratic  Programming 
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where  s  is  the  full  step  5  =  5^  +  The  linear  equality  constraints  ensure 
that  the  improvement  in  the  model  for  attaining  feasibility  is  not  disturbed. 
The  latter  subproblem  can  be  written  in  terms  of  step  St  in  the  more  useful 
form  below: 

min  Hst  +  [D{x)VJ{x,  A)  +  Hsn]'^st 

St  Z 

s.t.  [D{x)Vxh(xyf'st  =  0 

ll».||  <  -  IN„IP. 

Then  x  is  updated  as 


X  f-  X+D{x){.St  +  Sn). 

The  first  of  these  is  usually  called  the  normal  or  vertical  subproblem, 
while  the  latter  is  referred  to  as  the  tangential  or  horizontal  subproblem. 
It  is  again  clear  that  if  the  trust-region  constraint  is  inactive  in  both  the 
subproblems,  then  the  step  +  St  is  exactly  the  same  as  the  Newton  step 
in  X  given  by  system  (6).  If  the  trust  region  radius  is  too  small  to  al¬ 
low  the  earlier  ‘undecomposed’  subproblem  to  be  feasible,  the  normal  sub¬ 
problem  has  the  trust  region  constraint  active.  Observe  how  not  having 
[D(x)V xhlx)]"^ Sn  +  h(x)  =  0,  i.e.,  not  having  the  earlier  ‘undecomposed’ 
subproblem  feasible,  does  not  affect  the  algorithm  any  more. 

It  is  important  to  note  here  that  this  bilevel  aproach  does  not  actu¬ 
ally  give  us  a  step  in  A.  So  we  update  lambda  using  a  projection  formula. 
Remembering  that  at  optimality 

D\x*)VJ{x*,X*)  =  0 

i.e.,  D\x*)VJ{x*)  -I-  D^{x*)^^h{x*)X*  =  0 
we  take  A  at  each  iteration  to  be  the  least  squares  solution  of 

D^{x)V^h{x)X  =  -D^{x)V:,f{x) 


or,  in  other  words, 

A  =  -[iD\x)W,h{x)f{D\x)VM^))]-\D\x)VMx)fD\x)V,f(x). 

(7) 
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This  computation  is  elaborated  on  later. 

Following  common  practice  in  equality  constrained  optimization,  we  do 
not  actually  solve  the  tangential  subproblem  with  the  equality  constraints 
[X>(x)  =  0  but  instead  force  the  step  st  to  be  in  jVr([I)(x)V2rh(x)]^), 

the  nullspace  of  the  scaled  Jacobian  of  h{x)  so  that  the  equality  constraints 
are  automatically  satisfied.  More  specifically,  the  QR  factorization  of  [D^{x)Va:h{x)f 
is  computed,  which  also  helps  with  the  multiplier  update,  and  a  basis  Z{x) 
for  the  nullspace  of  [D‘^{x)^ is  extracted  from  the  last  n-m  columns 
of  Q.  Note  that  [D\x)Vxh{x)f  ,[D{x)Vxh{x)]'^  and  have  the 

same  nullspace,  since  a  <  x  <  b  makes  D{x)  necessarily  nonsingular.  The¬ 
oretical  details  on  this  nullspace  approach  including  the  issue  of  the  differ- 
entiablity  of  the  matrix  Z{x)  can  be  found  in  Goodman  [7]  . 

Once  Z{x)  is  obtained,  the  step  St  is  forced  to  satisfy 


St  =  Z{x)st. 


Substituting  this  in  the  tangential  subproblem  described  earlier  yields 
min  hjZ{xfHZ{x)st  +  [Z{xf  {D{x)VJ{x,  A)  Hs^)fst 

St  2 

S.t.  \\Z{x)St\\  <  ^6^  -  IlSnIP. 

Since  the  columns  of  Z{x)  are  orthonormal,  ||^(x)s/||  =  ||st||  yielding 
the  following  problem  which  we  shall  henceforth  refer  to  as  the  reduced 
tangential  subproblem 

min  l-sj Z{x)^HZ{x)st  -1-  [Z{x)^ {D{x)V xl{x ,  A)  -f  Hsn)fh 

St  2 

Now  a:  is  updated  as  x  ^  x  +  5,  where 

5  =  D{x){sn  +  Z{x)st)^ 

Z^Zst  =  sf  St,  since  Z^Z  is  the  n  —  m  x  n  —  m  identity. 
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3.2  Unsealed  step  subproblems 

The  trust  region  subproblem  in  terms  of  the  unsealed  step  can  be  derived  in 
one  of  three  ways  (all  of  which  may  not  foUow  the  formulation  of  trust-region 
theory): 

Based  on  the  Newton  steps  directly 

Based  on  the  Newton  step  in  (5),  the  trust  region  subproblem  in  the 
unsealed  step  can  be  written  directly  as 

min  \-s^Hs  d-  Vxl{x,  Xf  s 

s  2 

s.t.  Vxh{z)^s  +  h{x)  =  0 

Ikll  < 

where 

ii  =  X)  +  diag{'Vxl{x,  \))diag{T){x))D~\x). 

Decomposing  the  step  s  as  before,  the  normal  and  tangential  subprob¬ 
lems,  respectively,  turn  out  to  be 

min  ||Va;h(a;)^s„  +  /i(a:)|p 

$71 

S.t.  ||s„||  <  u8 

and 

min  hi  Z{xfHZ{x)st  +  [Z(.t)^(V,,/(x,  A)  +  Hsn)fst 
z 

s.t.  ||s,||  < 

Then  the  fuU  step  s  becomes 


s  =  -t-  Z{x)$t 

This  formulation  will  generally  be  referred  to  as  unsealed  version  1 

The  untraditional  and  possibly  unwanted  feature  of  this  is  that  the  nega¬ 
tive  gradient  step  for  the  horizontal  subproblem  is  along  —Vxlix,  A)  and  not 
along  -D^{x)Vxl{x,  A),  which  is  what  the  KKT  conditions  require.  This  is 
dealt  with  in  the  following  formulation. 
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Based  on  the  original  Newton  step  in  (4),  only  the  horizontal  sub¬ 
problem  turns  out  a  little  different: 

min  lsf  Z{xfD\x)HZixyst  +  [Z{xfD^{x){VJ{x,  A)  Hsn)fst 

St  2 

s.t.  ptil  <  y/^2  -  ||5„||2. 

The  negative  gradient  step  now  does  point  along  -D'^{x)Vxlix,\),  however 
the  symmetricity  of  the  second  order  term  is  lost,  which  is  also  untraditional 
in  trust  region  formulations. 

This  formulation  will  be  referred  to  as  unsealed  version  2 

Restoring  the  scale  in  the  scaled  step  subproblems 

Yet  a  third  formulation  can  be  obtained  indirectly  by  substituting 

Sn  ^  T) 

and  Zst  ^  D~^{x)Zst 

in  the  scaled  step  subproblems.  The  normal  and  tangential  subproblems 
thus  obtained  are: 

min  l|Va;h(x)^s„  -I-  /i(x)|p 

Sn 

s.t.  ||T)”^(a;)s„||  < 

and 

min  lsJZixfD-\x)HD-yx)Z{x)st+[Z{xfiVJ{x,  X)+D-'^Hsn)fst 

St  2 

s.t.  \\D-\x)Zst\\  <  -  ||5„||2, 

where,  as  defined  before, 

H  =  D{x)V'll{x,\)D{x)  + diag{Vxl{x,X))diag{'q{x)). 

Also  observe  that 

D-^{x)HD-\x)  =  H. 

This  formulation  is  along  the  lines  of  Coleman  &  Li  and  it  is  a  formulation 
of  this  type  that  they  use  to  prove  their  convergence  results.  However,  this 
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is  not  equivalent  to  the  scaled  step  subproblem  in  the  sense  that  the  steps 
obtained  may  not  be  identical  when  the  trust  region  constraint  is  binding 
(see  Appendix  A). 

This  formulation  will  be  referred  to  as  unsealed  version  3. 

Mixed  unsealed  version 

An  observation  is  in  order  regarding  the  vertical  subproblem:  if  Sn  is 
the  Newton  step  for  the  vertical  subproblem  in  the  scaled  step  and  is 
the  Newton  step  for  vertical  subproblem  in  unsealed  versions  1  or  2,  then  in 
general, 

D{x^Sn  ^  Sn- 

Proof: 

The  scaled  Newton  step  satisfies 

[D{x)V:,h{x)fsn  =  -h{x); 

i.e., 

Vj:h(a:)^(D(a;)J„)  =  -h{x). 

The  unsealed  (versions  1  and  2)  Newton  steps  satisfy 

Vxh{x)'^Sn  =  -h{x). 

The  preceding  yield 

Vi:/l(x)  (^D{x^Sti  Sn)  —  0, 

which  necessarily  implies  D{x)Sn  —  Sn  =  0  only  if  Vxh{x)'^  has  full  column 
rank,  which  is  never  true  because  in  our  very  problem  statement  we  have 
m  <  n  and  Vxh{xf  € 

{QED) 

Our  computational  experiments  suggest  that  this  fact  could  be  respon¬ 
sible  for  poor  performance  of  the  scaled  version  (or  even  unsealed  version 
3,  which  is  derived  from  the  scaled  version)  in  some  problems,  which  mo¬ 
tivates  the  following  mixed  unsealed  version.  This  version  has  the  normal 
subproblem  as  in  unsealed  version  1  (or  2)  and  the  tangential  subproblem 
as  in  unsealed  version  3,  i.e., 

min  ||V2;/i(x)^s,i  +  h{x)\\'^ 

Sn 
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S.t.  Il^nll  <  I'S, 

followed  by 

min  —sJZ{x)^D~^{x)HD~^{x)Z{x)st+[Z{x)^(VxK^:^)'^D  ^Jlsn)]^st 

St  2 

5.^.  \\D-\x)Zst\\  <  _  ||s^||2. 

3.3  Interiorization 

After  calculating  the  step  s,  it  is  scaled  by  the  damping  parameter  r  to 
ensure  that  the  updated  x  stays  strictly  feasible  with  respect  to  the  bounds. 
More  precisely,  let  us  define 

f  if  tti  >  -00  and  Si  <  0 

*  1  1,  otherwise 


Vi  = 


i 


Si  ’ 


1, 


if  bi  <  00  and  s;  >  0 
otherwise 


Then 


r  =  crmin(l,mm{ui,Uj}), 

t 


(8) 


where  a  =  0.99995  forces  strict  feasibility  of  x  with  respect  to  the  bounds. 


However,  if  some  Xj  is  too  close  to  a  bound  and  if  the  corresponding  St 
violates  that  bound,  it  could  happen  that  the  corresponding  w,  or  v,  is  too 
small,  making  the  damping  parameter  very  small,  resulting  in  very  short 
steps.  To  prevent  this  from  happening,  we  choose  a  tolerance  k  and  impose 
the  following  conditions: 


if  Ui  <  «,  Si  =  0,  Ui  <—  1 
if  Vi  <  K,  Si  =  0,  Vi  <—  1. 

In  our  implementation,  we  chose  k  =  10"®.  Thus  the  interiorized  step  is 

S  TS. 
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3.4  Step  acceptance  criterion 

As  is  common  in  trust  region  algorithms,  whether  a  step  is  accepted  or  not 
is  based  on  the  accuracy  of  the  actual  decrease  in  a  merit  function  relative 
to  the  predicted  decrease.  Let  us  denote  by  I  the  commonly  chosen  I2  norm 
augmented  Lagrangian  merit  function  in  equality  constrained  optimization: 

/(x,  A,  p)  =  /(x)  +  A^h(x)  +  p||h(x)|p, 

where  p  is  a  positive  number  bounded  below  and  is  updated  at  each  iter¬ 
ation.  Most  standard  convergence  results  require  {pk}  to  form  a  nonde¬ 
creasing  sequence.  However,  El-Alem  [5],  [6]  has  proved  convergence  for  a 
nonmonotonic  update  of  p. 

Let  x"*"  =  X  -|-  s,  A+  be  the  A  updated  according  to  the  projection  for¬ 
mula  (7).  Adapting  from  Coleman  and  Li  [1],  we  define  actual  reduction 
as 

ared  =  l{x,X,p)  —  l{x'^,X'^,p)  —  ^s^[diag{'Vxl{x,X))diag(Tj{x))D~'^{x)]s. 

Zd 

The  last  extra  term  accounts  for  the  fact  that  I  is  not  exactly  the  augmented 
Lagrangian  for  our  general  NLP,  which  also  has  bounds  on  x.  It  should  be 
noted  that  Vicente  in  [3]  proves  all  his  results  without  this  term  in  his  ex¬ 
pression  for  ared. 

Recalling  the  Newton  step  in  (5)  on  the  original  KKT  system,  the  pre¬ 
dicted  decrease  in  the  quadratic  model  is  expressed  as 

pred  =  — [Vj;/(x,  A)]^s  -  ^s^[V^/(x,  A) -|-  didgCV xl{x,  X))diag{T])D~\x)]s— 

{VxKxfs  +  h{x)fAX  -f  p{\\h{x)f  -  ||V^h(xf  s  -h  h(x)lp) 
where  A  A  =  A"^  —  A. 

The  step  is  usually  rejected  when  <  10“^,  and  accepted  otherwise. 
However,  computational  experience  shows  that  approximate  solutions  to  the 
trust  region  subproblems  can  often  result  in  negative  values  of  pred.  In  such 
a  case  the  step  could  be  admitted  even  if  ared  <  0.  To  prevent  this,  our 
implementation  requires  ared  >  0  in  addition  to  >  10~^  for  step  accep¬ 
tance.  The  trust  region  radius  is  updated  according  to  a  rule  based  on 
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described  later. 


A  startling  observation  made  in  course  of  our  computational  experiments 
was  that  the  steps  generated  by  our  algorithm  seemed  to  alternate  (we  don’t 
necessarily  mean  exactly  every  other  step)  between  improving  either  the 
merit  function  l{x,  A,p)  or  the  merit  function  defined  by 


^(x,A)  = 


D\x)VJ{x,X) 

h(x) 


which  is  a  reasonable  merit  function  considering  that 


D\x)VJix,X) 

h{x) 


=  0 


is  the  system  of  nonlinear  equations  which  we  aim  to  solve.  Computational 
observations  suggest  that  except  in  the  neighborhood  of  the  solution,  both 
the  merit  functions  rarely  exhibit  improvement  simultaneously.  So  using 
only  the  I{x,  X,p)  merit  function  could  lead  to  rejection  of  steps  which  actu- 
aly  make  progress  towards  the  solution.  In  our  implementation,  we  declare 
a  step  as  acceptable  when  it  satisfies  ^  >  10“^  for  any  one  of  the  two 
merit  functions.  This  strategy  results  in  a  huge  reduction  in  the  number  of 
iterations,  and  hence  function  and  gradient  evaluations.  Actual  reduction 
for  4>{x,  A)  is  obviously 

aredtj)  =  (f>{x,  A)  —  A"*"). 


Remembering  that 


<p{x,X) 


D\x)VJ{x,X) 

h{x) 


predicted  decrease  can  be  defined  as 

pred^l^  =  4>{x,  A)  -  +  D^(x)VJ{x,  A)|p  -  ||V,h(x)^5  +  h{x)f, 

where 

Ho  =  D'^(x)Wll{x,X)  +  diag(VJ{x,X))diag{T}{x)). 
Alternately,  since 


■  D\x)VJ{x,X)  ■ 

Ho  Vxh(x) 

h{x) 

^:,h{xf  0 
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an  expression  for  pred  (using  only  a  first  order  approximation)  could  be 

h{x) 

In  our  computation,  we  take  pred^  =  pred^^\  unless  pred^J^  <  0,  in 
which  case  we  take  pred^  =  m.djc{pred^^\pred^^^}. 

4  Further  Computational  Details 

4.1  Solving  the  subproblems 

There  has  been  a  considerable  amount  of  work  on  computing  approximate 
solutions  of  trust  region  subproblems  in  recent  years  (see  for  example  More 
k  Sorensen  [9],  Zhang  [11],  Santos  k  Sorensen  [12],  Sorensen  [13],  Steihaug 
[14]).  Standard  convergence  results  usually  require  the  computed  step  to 
satisfy  a  fraction  of  Cauchy  decrease  or  a  fraction  of  optimal  decrease  in  the 
merit  function,  the  latter  being  computationally  more  expensive  but  provid¬ 
ing  stronger  results.  Introductory  material  on  this  can  be  found  in  Dennis 
k  Schnabel  [15]. 

Our  forthcoming  implementation  uses  More  and  Sorensen’s  subroutine 
DGQTPAR  described  in  [9]  to  compute  the  solution  for  the  reduced  tangen¬ 
tial  subproblem.  The  chief  advantage  of  this  is  that  it  nxostly  returns  a  step 
satisfying  fraction  of  optimal  decrease  even  if  the  subproblem  is  nonconvex. 
Since  it  might  be  unreasonable  to  expect  a  positive  definite  Z{x)^ H Z{x)^ 
this  property  of  DGQTPAR  is  very  desirable. 

The  normal  subproblem  in  the  scaled  step  can  be  shown  to  be  equivalent 
to  ^ 

min  -s^HSn  + 

S-n  Z 

s.t.||s„||  <  v6, 

where 

H  =  [Dix)V^h{x)][D(x)W^h{x)f, 

7  =  D{x)Vxh{x)h{x). 

®Since  there  is  no  guarantee  that  A"^)  —  VxZ(a:, A'*))^s  >  0  ,  not  even  the 

BFGS  update  is  guaranteed  to  be  positive  definite. 


Ho 

'Vxh{x) 

s 

_  ^xh{xf 

,  0 

AA 

14 


This  subproblem  is  also  solved  using  DGQTPAR.  It  should  be  noted  that 
the  matrix  H  is  positive  semidefinite  and  is  necessarily  singular;  however 
DGQTPAR  can  handle  this  singularity. 

The  normal  subproblem  in  the  unsealed  step  is  exactly  the  same  as  the 
above  with  D{x)  replaced  by  In- 

It  must  be  mentioned  that  the  computational  results  reported  later  were 
generated  using  a  Matlab  implementation,  which  approximates  the  solutions 
to  the  above  trust  region  subproblems  using  dogleg  steps  (see,  for  example, 
Dennis  &  Schnabel  [15]). 

4.2  Role  of  QR  factorization 

The  QR  factorization  of  D^{x)Vxh{x)  plays  an  important  role  in  our  algo¬ 
rithm  and  implementation  and  hence  deserves  special  mention.  Let  the 
QR  factorization  for  D'^{x)Vxh{x)  be  partitioned  as  below’  (recall  that 
m  =  dim{h)) 

F  =  (?(:,!  :m) 

Z  =  Q{:,m+  1  :  n) 

R  =  R{1  :  m,  1  :  m). 

The  columns  of  Y  contain  an  orthonormal  basis  for  the  range  of  D'^{x)V xh{x), 
or  of  Vxh{x),  since  J9^(x)  is  nonsingular  when  a  <  x  <  b,  and  the  columns 
of  Z  contain  an  orthonormal  basis  for  the  nuUspace  of  [Z)^(x)Vj,h(x)]^  or  of 
'Vxh{x)'^.  Further,  R  is  upper  triangular  and  nonsingular  and  rows  m+l  :  n 
of  R  contain  only  zeros.  Hence 

QR  =  YR 

The  use  of  Z  in  eliminating  the  equality  constraints  in  the  tangential  sub- 
problem  has  already  been  discussed;  now  we  discuss  how  the  QR  factoriza¬ 
tion  is  used  in  updating  A  using  projection  formula  (7)  and  in  computing 

Substituting  D'^{x)Vxh{x)  =  QR  =  YR  ,  projection  formula  (7)  takes 
the  form 

A  =  -[R^Y'^YR]-^RFY^D\x)Wxfix) 

^We  assume  that  the  reader  is  familiar  with  Matlab  or  FORTRAN  90  syntax  for  ref¬ 
erencing  matrices. 
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=  -[R^R]-^RVd^{x)VJ{x) 


<=> 

R^RX  =  -Wy'^D^(x)VJ{x). 

Since  R^  is  nonsingular,  the  above  is  equivalent  to 

RX  =  -Y^D\x)W^f{x) 

Since  R  is  upper  triangular,  A  can  be  found  from  the  above  by  simple  back 
substitution  involving  only  O(m^)  operations,  which  is  considerably  eco¬ 
nomic  in  comparison  with  standard  Gaussian  elimination  on  (7). 

An  important  observation  here  is  that  the  computation  of  A"*"  is  depen¬ 
dent  on  D{x'^),  which  itself  depends  on  the  sign  of  (Vj:/(a;+,  A+))i,  and  thus 
requires  an  estimate  of  A+.  In  practice,  D{x)  is  updated  first  and  D{x'^)  is 
obtained  using  the  old  multipliers;  i.e.,  D{x'^)  is  obtained  using  the  signs  of 
the  components  of  Va;/(a;"'‘,  A(*1).  The  QR  factorization  of  is 

then  computed  and  stored  for  use  in  the  next  iteration  and  A"*"  is  obtained 
using  this  new  QR  factorization  as  in  (7).  This  intermediate  QR  factor¬ 
ization  can  be  an  unnecessary  additional  expense  when  steps  get  rejected 
frequently,  but  can  probably  be  reused  in  the  next  step  wherever  the  model 
is  good,  in  particular  in  the  neighborhood  of  a  solution. 

However,  in  practice,  even  this  strategy  can  perform  poorly  and  give 
inaccurate  results,  and  we  need  yet  another  round  of  predictor- corrector  re¬ 
finement  of  D{x)  and  A.  It  has  been  found  that  the  strategy  that  gives 
fewest  iterations  is  finding  A^’^®  first  with  D{x'^),  found  using  A), 

then  updating  Z>(a;+),  using  VJ{x'^,  A^*)),  and  then  updating  the  multiplier 
once  again  using  the  new  D{x'^)  to  finally  find  A+.  However,  this  requires  an 
extra  intermediate  QR  factorization  which  certainly  cannot  be  reused,  but 
is  a  necessary  expense  in  our  opinion.  Approximate  strategies  for  updating 
the  QR  factorization  might  be  helpful  here;  if  one  can  be  devised.  Other 
multiplier  update  formulas  which  avoid  the  QR  factorization  are  also  being 
looked  into. 


for  ‘intermediate  multipliers’. 
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4.3  Trust  region  update 

The  strategy  for  updating  the  trust  region  radius  S  is  as  follows: 


Let  r  = 


ared 
pred  ‘ 


•  If  r  <  10“^  or  ared  <  0  (step  is  rejected) 


Let 


I3i 


'  0.5,  if  ^  <  10 
<  0.25,  if  10  <  ^  <  10“  ; 
0.1,  if  ^  >  10“ 


then  6  =  max{/?i  min(^,  l|s||),^min}- 


else  step  is  accepted. 


•  If  10““  <  r  <  10“^,^  =  niax{0.5^, Ikll) 

•  If  0.01  <  r  <  0.1,  ^  =  max{0.9^,^,„i„} 

•  Ifr>  0.75, 


Let 


'  10,  if  ^  <  10“'“ 

5,  if  10““  <  ^  <  0.1  . 

4,  if  0.1  <  (5  <  100  ’ 

2,  if  ^  >  100 
\  " 


then  6  =  rmn{^2^  1  ^max} 


4.4  BFGS  Approximation 

The  second  order  quantity  V^/(x,  A)  is  estimated  by  the  BFGS  approxima¬ 
tion  (see  Dennis  &  Schnabel  [15]  for  details).  Given  the  current  approxima¬ 
tion  B,  the  updated  can  be  described  as  below: 

j,  =  V,/(x+,A+)-V,/(a:,A+) 


w  =  Bs 


5+  =  5  + 


yy^ 

yTs 


T 

WW^ 

w^s 
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It  can  be  seen  easily  that  if  B  is  symmetric,  so  is  Further,  if  y^s  >  0 
and  B  is  positive  definite,  so  is  B"^ . 

4.5  Penalty  parameter  update 

The  penalty  parameter  p  is  updated  according  a  scheme  similar  to  that  in 
El-Alem[4],  customized  for  our  setting. 

If  pred  >  0.5  p  (IIM^)IP  -  ||V^h(x)s  +  /i(a;)||2) 

0ls6 

+  _  O  [-D^(x)Vj;/(x,  A)]^s  +  ^s^Hs 
^  ~  ||h(x)|12-||V,h(x)s.+  h(x)|P+^’ 

where  p  is  a  fixed  positive  parameter,  chosen  to  be  10  in  our  implementation 
and 

H  =  D\x)Vll{x,  A)  +  diag{VJ{x,  \))diag{'n{x)) 

It  is  not  difficTilt  to  see  that  this  update  yields  a  non-decreasing  sequence  of 
penalty  parameters,  provided  ||h(x)|p  -  ||Va;h(x)s  -I-  h(x)|p  >  0. 

4.6  Scaling  matrix  in  the  implementation 

Vicente  in  [3]  reports  that  the  following  form  of  the  vector  d{x)  is  more 
efficient  in  dealing  with  bounds  that  are  far  from  active  at  the  solution, 
which  is  corroborated  by  a  huge  reduction  in  the  number  of  iterations  in  our 
computational  experience. 

min{l,-v/6i  -  xj,  if  {yj{x,\))i  <  0  and  bi  <  oo 
di{x)  -  <  min{l,^/x,-  -  a,},  if  (Va:f(x,  A)),-  >  0  and  a;  >  -oo, 

1,  otherwise 

Our  computational  experience  suggests  that  the  vector  77(x)  should  remain 
as  defined  in  (3). 

4.7  Stopping  Criterion 

The  stopping  criterion  is  partly  determined  by  the  norm  of  the  stopping 
vector  sv  which  is  defined  as 

svi  =  min{|(V^/(x,  A))i|,4(xi)} 
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The  current  point  is  reported  as  optimal  when 

||sv||  <  n*stoptol  and  ||h||  <  m*htoL 

Our  implementation  uses  stoptol  =  10“®  and  htol  =  10“®.  This  absolute 
tolerance  worked  well  for  our  test  problems,  a  relative  tolerance  criterion 
could  also  have  been  used. 

4.8  Handling  degeneracy 

It  is  well  known  that  Newton’s  method  can  face  computational  difficulties 
and  convergence  can  be  retarded  if  the  solution  is  degenerate,  i.e.  if  there 
exists  i  ^  1, . . . ,  n  such  that 

d?«)  =  0  and  A*)),-  =  0 

To  deal  with  the  above  degeneracy,  Va;/(x,A)  is  redefined  as  below  in  a 
manner  similar  to  Coleman  &  Li  [2]  : 

u  +  if|W(a:,A)X|  +  |di(xOI<kl 

“  I  (V^/(a;,A))i,  otherwise, 

where 

€  =  sign{{VJ{x,  X))i)iJ.t 

and,  say,  pc  =  10“®.  Note  that  choosing  stoptol  <  Hc  can  preclude  conver¬ 
gence. 

5  Observations  on  incorporating  inequality  con¬ 
straints 

As  was  stated  in  the  introduction,  this  formulation  can  be  used  to  handle 
smooth  nonlinear  inequality  constraints  by  adding  slack  variables  to  the 
inequalities.  Let  us  consider  the  general  nonlinear  programming  problem 
with  equalities  and  inequalities 


min  fix) 

X 

s.t.  c{x)  =  0 
g{x)  <  0 
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a  <  X  < 

which  is  equivalent  to 

min  f(x) 

X 

s.t.  c(x)  =  0 
ff(x)  +  s  =  0 
a  <  X  <  b 
s  >  0. 

Let  dim{g)  =  p,  the  number  of  inequalities,  and  dim(c)  =  k.  Now  the 
scaling  matrix  D{x)  becomes  D{x,s).  A  close  look  at  Va;,s/(a:,  s,  A)  shows 
that  using  our  previous  definitions,  the  last  p  components  of  the  vector 
d{x,s)  are  defined  as 


min{l,y^},  if  A„+i  >  0 
1,  otherwise 


for  1  =  1,..  .,p. 

A  quick  look  at  the  last  p  components  of  'V‘^  J{x,  s,  A)  =  0,  reveal  that  it  is 
only  an  expficit  restatement  of  complementarity.  Observing  that  A„+,- ,  i  = 
l,...,p,  are  just  the  multiphers  of  the  inequality  constraints  (commonly 
denoted  by  p),  the  above  definition  of  is  easily  identified  as  the 

mechanism  forcing  nonnegativity  of  the  Lagrange  multipliers  of  the  inequal¬ 
ity  constraints. 


Finally,  the  requirement  of  Vxh{x)  having  full  rank  gets  translated  into 
[Va;,sC(x),  4-  s]  having  full  rank.  Given  that  Vsc(a;)  =  Opxfc  and 

Vs5(x,s)  =  Ip,  it  becomes  equivalent  to  only  \'xc{x)  having  fuU  rank.  Note 
that  this  requirement  on  the  rank  stems  simply  from  the  issue  of  R  from 
the  QR  factorization  being  nonsingular,  and  not  from  satisfying  a  constraint 
qualification  at  the  solution.  Also  note  that  the  m  <  n  requirement  earlier 
now  becomes  p  +  k  <  n  +  p,  which  is  the  same  as  <  n. 


6  Some  Computational  Observations 

•  Speed  of  convergence  is  very  sensitive  to  v,  the  parameter  in  the  nor¬ 
mal  subproblem  which  determines  what  fraction  of  the  total  trust  re¬ 
gion  radius  would  be  allowed  for  the  horizontal  subproblem.  For  an 
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efficient  implementation  applicable  over  a  wide  range  of  problems,  a 
strategy  for  adapting  i'  based  on  the  iteration  history  should  certainly 
be  devised. 

•  Convergence  is  also  very  sensitive  to  the  initial  trust  region  radius. 

•  It  is  fairly  sensitive  to  the  trust  region  update  scheme.  Here  we  have 
presented  what  seemed  optimal  based’  on  our  computational  experi¬ 
ences. 

•  The  number  of  iterations^  is  greatly  reduced,  often  by  hundreds,  if  we 
do  not  reject  the  very  first  step  that  fails  to  satisfy  the  step  acceptance 
criterion  but  accept  it  instead. 

Below  are  results  of  computational  tests  using  a  Matlab  4.2a  implemen¬ 
tation  (on  a  SUN  OS  4.1.3  workstation)  on  a  few  test  problems.  The  best, 
followed  by  the  worst  performance  (for  the  few  parameter  settings  we  at¬ 
tempted)  are  tabulated  for  the  scaled  version,  unsealed  versions  1,  2,  &  3  and 
the  mixed  unsealed  version  along  with  their  corresponding  u  and  So-  Unless 
otherwise  mentioned,  by  default  the  initial  trust  region  radius  is  set  to 
2max{||a:o||,l},  p  =  1000, po  =  1  and  the  first  rejectable  step  is  accepted. 
The  stopping  criterion  is  as  mentioned  earlier.  Results  from  a  primal-dual 
interior  point  algorithm  and  Matlab’s  active  set  general  NLP  solver  constr 
are  also  provided  for  the  sake  of  comparison. 

Problem  1 


min  f{x)  =  xl  +  3x2  —  O.IX3X4  +.e  -f  (xs  -  2x2)^ 

X 

S.t.  Xi  -I-  2X2  +  4X3  -I-  6x4  +  7X5  =  0 
Xi  —  3x2  +  O.3X2X4  —  X5  =  0 
2xi  -I-  X2  —  O.lxf  =  0 
3xi  -f  4(x2  +  xs)^  =  25 
-10  <  Xi  <  10,i=  1,2,3,5 

-11  <  X4  <  10 

^  Which  is  exactly  the  same  as  the  number  of  function  and  gradient  evaluations  in  our 
way  of  counting. 
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Three  KKT  points^®  were  found  in  course  of  these  computations: 

•  [-2.2098,1-4782,10.0,-3.1898,-3.0868],  with  /*  =  49.2568  (referred 
to  as  KKT  pt.  1) 

•  [1.4793,  -0.6858, 10.0,  -9.9893, 2.8326]  with  /*  =  29.7818  (KKT  pt.  2) 

•  [0.0882,  -0.73,2.2183,0.8134,-1.7689],  with  /  =  —0.1921,  the  lowest 
function  value  among  the  three  (KKT  pt.  3) 

Starting  point:a;i  =  -6.3,  X2  =  1.0,  X3  =  1.0,  X4  =  0.55,  X5  =  1.0. 


Subproblem  type 

iterations 

1/ 

^0 

Point  of  convergence 

scaled 

9 

0.5 

default 

KKT  pt.  1 

14 

0.7 

default 

unsealed  v.l 

KKT  pt.  1 

46 

0.7 

default 

unsealed  v.2 

40 

KKT  pt.  1 

45 

0.7 

default 

unsealed  v.3 

53 

0.5 

default 

KKT  pt.  1 

69 

0.7 

default 

mixed  unsealed 

53 

0.5 

default 

KKT  pt.  1 

58 

0.7 

default 

KKT  pt.  2 

The  primal-dual  interior  point  method  with  essentially  line  search  global¬ 
ization  converged  in  31  iterations  to  KKT  point  2.  The  stopping  criterion 
for  the  primal  dual  algorithm  was 

||Vj;/(x,  A,//a,//6)||  +  ||h||  <  10“® 

Matlab’s  constr  converged  in  9  iterations  to  KKT  point  1. 

Starting  point  of  xi  was  changed  to  +6.3. 


^^Karush-Kuhn  Tucker  points,  i.e.,  points  satisfying  first  order  necessary  conditions  for 
optimality. 

^^The  implementation  we  used  might  not  be  the  best  available. 
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Subproblem  type 

iterations 

u 

^0 

Point  of  convergence 

scaled 

12 

0.6 

default 

KKT  pt.  2 

81 

0.45 

unsealed  v.l 

0.8 

mm 

KKT  pt.  3 

120 

0.8 

default 

KKT  pt.  2 

unsealed  v.2 

29 

0.8 

1.0 

KKT  pt.  3 

121 

0.8 

default 

KKT  pt.  2 

unsealed  v.3 

34 

0.7 

1.0 

KKT  pt.  3 

108 

0.75 

default 

KKT  pt.  2 

mixed  unsealed 

29 

0.8 

default 

KKT  pt.  3 

134 

0.6 

default 

KKT  pt.  2 

Primal-dual  interior  point,  as  before,  converged  in  31  iterations  to  the 
second  KKT  point. 

Matlab’s  constr  did  not  converge  in  500  iterations. 

Problem  2 

Same  as  the  first  problem,  except  that  the  fourth  equality  was  now  con¬ 
verted  to  an  inequality,  i.e., 

3x1  +  4{x2  +  xs)^  <  25 

and  a  slack  variable  was  added  to  get  it  in  our  form: 

3xi  -f  4(x2  -f  Xs)^  -1-  X6  =  25 


X6  >  0. 

Observe  that  the  previous  KKT  points  are  still  KKT  points  for  this  prob¬ 
lem  (with  the  slack  variable  xe  =  0.)  so  they  shall  be  referred  to  as  KKT 
points  1,  2  and  3  as  before. 

Starting  point:  [6.3, 1,  1,  0.55, 1,  1]  (last  variable  is  a  slack): 
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Subproblem  type 

iterations 

V 

^0 

Point  of  convergence 

scaled 

31 

0.6 

1.0 

KKT  pt.  2 

>  500 

0.65 

default 

n/a 

unsealed  v.l 

46 

0.8 

1.0 

KKT  pt.  3 

95 

0.65 

default 

KKT  pt.  2 

unsealed  v.2 

43 

0.8 

1.0 

KKT  pt.  3 

99 

0.65 

default 

KKT  pt.  2 

unsealed  v.3 

148 

0.5 

100 

KKT  pt.  3 

256 

0.65 

1.0 

KKT  pt.  2 

mixed  unsealed 

55 

0.65 

default 

KKT  pt.  3 

>  500“ 

0.6 

default 

n/a 

“Converges  in  49  iterations  to  KKT  pt.  3  if  the  second  merit  function  is  dropped. 


Primal-dual  interior  point  converges  in  48  iterations  to  the  second  KKT 
point. 


Matlab’s  constr  converges  in  101  iterations  to  point  3. 
Starting  point  changed  to  [-9,  ^9,  -9,  —9,  -9, 1]: 


Subproblem  type 

iterations 

V 

^0 

Point  of  eonvergenee 

sealed 

50 

0.8 

1.0 

KKT  pt.  3 

141 

0.65 

default 

unsealed  v.l 

100 

0.6 

default 

KKT  pt.  3 

119 

0.65 

default 

unsealed  v.2 

98 

0.6 

default 

KKT  pt.  3 

117 

0.65 

default 

unsealed  v.3 

103 

0.8 

default 

KKT  pt.  3 

119 

0.5 

default 

mixed  unsealed 

104 

0.6 

default 

KKT  pt.  3 

129 

0.65 

default 

Primal-dual  interior  point  does  not  converge  in  800  iterations. 
Matlab  takes  22  iterations  to  converge  to  point  3. 

Starting  point  changed  to  [9.5, 9.5, 9.5, 9.5, 9.5,1]: 
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Subproblem  type 

iterations 

V 

^0 

Point  of  convergence 

scaled 

39 

0.7 

1.0 

KKT  pt.  2 

79 

1.0 

unsealed  v.l 

103 

1.0 

KKT  pt.  3 

125 

0.65 

default 

unsealed  v.2 

140 

0.8 

default 

KKT  pt.  2 

183 

0.8 

1.0 

KKT  pt  3 

unsealed  v.3 

69 

0.8 

default 

KKT  pt.  3 

87 

0.65 

default 

mixed  unsealed 

69 

0.8 

default 

KKT  pt.  3 

87 

0.65 

default 

Primal-dual  converges  in  68  iterations  to  the  second  KKT  point. 


Matlab  converges  to  KKT  point  2  in  74  iterations. 

Problem  3 

The  last  three  equalities  in  problem  1  were  converted  to  <  inequalities 
and  slacks  were  added  to  them. 

All  the  convergent  runs  of  our  algorithms  converged  to  only  one  KKT 
point:  [-0.0131,  -0.8609, 1.6511, 1.1007,-1.6390, 0.8687, 0.4467, 0]  (last  three 
slack  variables)  with  /  =  -0.3921,  hence  the  point  of  convergence  is  omitted 
from  the  table.  The  first  two  inequalities  are  inactive  and  the  last  one  is 
active. 

Starting  point:  [2,  6,  6,  -6,  -6,  2,  1,  1]  (last  three  variables  are  slacks  on 
the  inequalities) 
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Subproblem  type 

iterations 

V 

^0 

scaled 

>  500 

a]l“ 

def.,  1.0 

unsealed  v.l 

187 

default 

224 

1.0 

unsealed  v.2 

202 

0.75 

1.0 

220 

0.65 

default 

unsealed  v.3 

>  500 

aU 

mixed  unsealed 

78 

0.7 

nmgiiiiii 

223 

0.65 

default 

*  Indicates  that  it  did  not  converge  for  ztll  the  parameter  settings  attempted 

Primal-dual  interior  point  did  not  converge  in  500  iterations. 
Matlab  converged  in  38  iterations. 

Starting  point  changed  to  [6.3, 1, 1, 0.55, 1, 2, 1, 1]. 


Subproblem  type 

iterations 

P 

^0 

scaled 

aU 

def.,  1.0 

unsealed  v.l 

167 

0.8 

default 

175 

0.65 

default 

unsealed  v.2 

166 

0.65 

>  500 

0.8 

unsealed  v.3 

191 

0.65 

default 

not  conv. 

other® 

mixed  unsealed 

82 

0.55 

10.0 

266 

0.65 

1.0 

“Did  not  converge  for  other  attempted  parameter  settings 

Primal-dual  interior  point  failed  to  converge  within  500  iterations. 
Matlab  converged  in  60  iterations. 

Problem  4 

(Hock  &  Schittkowski  test  problem  set,  no.  100) 

min  f{x) 

X 


26 


=  {xr-lQf+f>{x2-l2f+xi+Z{xi-nf+lQxl+lxl+x^-AxQX7-l{ixQ-^X7 

s.t,  2xj  4*  3^2  4"  ^3  4"  4X4  4"  5x5  ^  127 

7xi  +  3x2  4-  10x|  4-  X4  -  X5  <  282 

23xi  4-  X2  4-  6x|  -  8x7  <  196 

Ax\  4-  a;2  —  3xiX2  4-  2x§  4-  5x6  —  IIX7  <  0 

This  problem  was  converted  into  our  required  form  using  slack  variables, 
and  fix)  was  scaled  by  10“^  to  reduce  ill-conditioning. 

Convergent  runs  converged  to  only  one  KKT  point, 

[2.3309, 1.9459,  -0.4747, 4.3794,  -0.6238, 1.0372, 1.5954, 0, 252.5904, 144.9098, 0] 

with  /*  =  680.6,  the  first  and  last  inequalities  being  active,  and  the  second 
and  third  being  ‘highly  inactive’.  Instead  of  focussing  on  the  difference 
between  the  best  and  worst  number  of  iterations  due  to  changes  in  and 
V,  we  tabulate  representative  results  for  each  of  the  algorithms  below. 

Starting  point;  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  (last  four  slacks) 


Subproblem  type 

No.  of  iterations 

1/ 

^0 

scaled 

>  500“ 

unsealed  v.l 

238 

0.6 

default 

unsealed  v.2 

229 

0.6 

default 

unsealed  v.3 

iU-cond 

mixed  unsealed 

268 

0.55 

default 

“high  ill-conditioning  after  300  iterations 


The  primal-dual  interior  point  code  available  to  us  required  at  least  one 
upper  bound,  so  we  set  the  upper  bound  on  X2  to  10000  (which  is  not  active 
at  the  solution),  however  it  did  not  converge  within  500  iterations. 

Matlab’s  constr  converged  in  273  iterations. 

Problem  5 

(Hock  &  Schittkowski  test  problem  set,  no.  81) 

min  6^1^2^33:4x5  _  i/3.3  +  +  1)^ 

X  2 
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s.t.  ||x|p  =  10 
X2X3  —  5x4X5  =  0 

arf  +  xf  +  1  =  D 
—2.3  <  xi,X2  <  2.3 
—3.2  <  X3,X4,X5  <  3.2 

Eleven  distinct  KKT  points  were  found  in  course  of  the  numerical  ex¬ 
periments,  most  of  them  with  /*  =  1,  one  with  /*  =  0.0539  and  another 
with  /*  =  0.4389.  For  the  sake  of  elegance,  a  list  of  those  KKT  points  and 
thereof  is  skipped. 

Starting  point:  [1, 1, 1, 1, 1] 


Subproblem  type 

No.  of  iterations 

V 

^0 

scaled 

22 

0.65 

default 

unsealed  v.l 

39 

0.75 

default 

unsealed  v.2 

35 

0.75 

default 

unsealed  v.3 

42 

0.65 

default 

mixed  unsealed 

33 

Primal-dual  converges  in  49  iterations. 
Matlab  converges  in  96  iterations. 

New  starting  point:  [2,— 2,2, -2,2] 


Subproblem  type 

No.  of  iterations 

V 

^0 

scaled 

363 

0.8 

default 

unsealed  v.l 

306 

0.65 

default 

unsealed  v.2 

19 

0.65 

default 

unsealed  v.3 

77 

0.75 

1.0 

mixed  unsealed 

42 

0.65 

1.0 

Primal-dual  interior  point  does  not  converge  in  500  iterations,  and  nei¬ 
ther  does  Matlab ’s  constr. 
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6.1  Inferences 

Since  the  range  of  problems  tested  here  was  fairly  restricted,  no  attempts  will 
be  made  to  make  sweeping  generalizations  based  on  the  above  observations. 
It  does  seem  however  that  the  scaled  version  fails  to  perform  well  in  the 
presence  of  nonlinear  equalities,  though  it  does  remarkably  well  otherwise. 
The  sensitivities  of  the  convergence  to  i/  and  were  among  the  two  chief 
things  we  aimed  to  and  did  demonstrate.  The  algorithms  are  stiU  under 
development  and  further  testing  is  in  progress.  The  mixed  unsealed  version 
is  arguably  the  most  robust  of  the  algorithms  and  will  be  the  prime  focus 
of  further  development.  Our  forthcoming  implementations  in  Fortran  and 
C  will  use  DGQTPAR  to  solve  the  subproblems,  and  could  provide  further 
improvements  on  the  performance  reported  here,  since  this  implementation 
only  uses  dogleg  steps. 

6.2  Advantages 

The  chief  advantages  of  this  algorithm  are  over  other  algorithms  for  the 
general  NLP  are  as  follows: 

•  Solves  the  ‘sticking  problem’:  Many  primal-dual  interior  point  algo¬ 
rithms  have  the  undesirable  property  that  some  of  the  bounded  vari¬ 
ables  often  get  ‘attracted’  to  the  wrong  bounds  and  when  the  iterates 
reach  the  bounds  they  ‘stick’  to  those  bounds  which  do  not  corre¬ 
spond  to  the  solution,  and  hence  convergence  is  hindered. This  is 
one  problem  that  is  apparently  solved  by  the  Coleman-Li  scaling,  since 
the  scaled  step  is  sufficiently  ‘angled  away’  from  approaching  bounds. 
The  starting  points  for  several  of  our  tests  were  very  close  to  the  vari¬ 
able  bounds,  but  the  ‘sticking  phenomenon’  never  occurred.  Similar 
observations  are  also  made  in  Coleman  &  Li  [1],  [2]  and  in  Vicente  [3]. 

•  Reduces  the  number  of  variables:  Since  this  algorithm  never  computes 
or  requires  the  multipliers  of  the  bound  constraints,  the  number  of 
variables  is  greatly  reduced  compared  to  primal-dual  interior  point 
methods  or  active  set  algorithms. 

•  Advantage  over  active  set  algorithms:  This  algorithm  wins  over  active 
set  algorithms  in  the  same  manner  as  any  other  interior  point  algo¬ 
rithm.  In  particular,  it  is  well-known  that  active  set  strategies  can  be 

^^the  reason  for  this  becomes  clear  on  examining  the  Newton  step  on  the  complemen¬ 
tarity  conditions. 
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very  inefficient  in  the  presence  of  a  large  number  of  bounds  and/or 
a  large  number  of  inequality  constraints,  especially  when  most  of  the 
bounds  or  inequality  constraints  are  not  active.  Further,  an  active 
set  strategy  can  identify  the  wrong  active  set  and  converge  to  a  point 
where  some  of  the  inequalities  or  bounds  are  not  satisfied. 


7  Conclusion 

An  interior  point  algorithm  for  solving  the  general  nonlinear  programming 
problem  using  trust  region  globalization  was  presented.  It  is  quite  likely  that 
it  would  be  possible  to  rigorously  prove  under  reasonable  assumptions  global 
convergence  of  the  iterates  and  local  q-quadratic  convergence  in  the  presence 
of  exact  second-order  information  or  local  q-superlinear  convergence  using 
a  secant  aproximation.  Our  aim  in  this  paper,  in  addition  to  motivating 
theoretical  analyses  of  this  algorithm,  was  to  extract  from  computational 
experiments  ingredients,  perhaps  heuristics,  needed  to  expedite  the  semi¬ 
local  convergence  of  this  algorithm.  With  more  computational  tests  on  a 
wider  set  of  problems  in  progress,  the  author  hopes  that  this  treatise  would 
attract  other  computational  scientists  and  theoreticians  to  study  this  algo¬ 
rithm  in  further  depth.  In  fact,  Dr.  Mahmoud  El-Alem  is  currently  involved 
in  proving  convergence  for  these  algorithms. 

8  Appendix  A:  Non- equivalence  of  subproblems 
in  scaled  and  unsealed  steps 

Consider  Coleman  &  Li’s  scaled  and  unsealed  subproblems  in  [1],  rewrit¬ 
ten  in  our  notation  {B  is  an  approximation  to  V^/(a;),  D  =  D{x),  and 
J  =  diag{V  xf{x))diag{r]{x)): 

Scaled  subproblem 

min  \f[DBD  +  J]s  +  [DVJ(x)fs 

s  2 

s.t.  pll  <  6 

Unsealed  subproblem 

min  -F  JD-^)s  +  V^f{xfs 

^  2 
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s.t.  \\D  ^s||  <  6. 

Let  the  multipliers  of  the  trust-region  inequalities  for  the  two  subproblems 
be  respectively  fl  and  fi.  Suppose  the  Newton  steps  are  outside  the  trust 
regions,  so  that  corresponding  to  the  optimal  steps,  ix  >  0  and  p,  >  0  and 
the  trust  region  inequalities  are  active.  Suppose  there  does  not  exist  an 
eigenvalue  of  equal  to  |  that  has  s  as  its  corresponding  eigenvector. 
Then 

Ds  5^  s 

Proof: 

Recalling  the  necessary  condition  for  optimality,  the  optimal  step  from  the 
scaled  subproblem  must  satisfy 

(^DBD  J 

=  {DB  +  JD-^  -I-  pD-'^)Ds  =  -DWxf{x). 

Since  /,  D  are  diagonal,  we  have 

{B  +  JD~^  +  pD~^)Ds  =  -S/xf{x).  (9) 

Similarly,  the  optimal  step  from  the  unsealed  subproblem  satisfies 

=  {B  +  JD-^  +  fi)s  =  -Vxf{x).  (10) 

The  proof  now  follows  by  contradiction.  Assume  that  Ds  =  s.  Then 
substituting  s  for  Ds  in  (9)  and  subtracting  (9)  from  (10)  we  get 

{n  -  pD~^)s  =  0. 

Thus,  since  //  >  0 

=  L. 

IJ- 

But  the  above  cannot  hold  since,  by  hypothesis,  |  is  not  an  eigenvalue  of 
with  corresponding  eigenvector  s. 

Hence,  by  contradiction, 

Ds  ^  s  {QED) 

It  is  noteworthy  that  since  is  diagonal  with  nonzero  entries,  its  eigen¬ 
vectors  are  the  canonical  basis  vectors  ei  of  ,  hence  for  a  step  s  to  even 
qualify  as  an  eigenvector  oi  would  mean  that  it  must  have  only  one 
nonzero  element. 
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